SQL enhancements to support text queries on speech recognition results of audio data

ABSTRACT

A system, method, computer program product, and application program interface for indexing data relating to results of speech recognition in a database management system provides the capability to perform simple and efficient searches on audio speech data with reduced development effort. An application program interface for indexing data relating to results of speech recognition in a database management system comprises an indextype operable to support text queries on speech recognition results, an interface operable to provide interaction with an index of the indextype, and a format adapter interface to a format adapter that the index creation activity will invoke to extract relevant information from a proprietary speech recognition format.

CROSS-REFERENCE TO RELATED APPLICATIONS

The benefit under 35 U.S.C. § 119(e) of provisional application Ser. No. 60/419,520, filed October 21, 2002, is hereby claimed.

FIELD OF THE INVENTION

The present invention relates to a system, method, computer program product, and application program interface for indexing data relating to results of speech recognition in a database management system.

BACKGROUND OF THE INVENTION

Speech recognition technology provides the capability to design computer systems that can recognize spoken words. Speech recognition systems accept audio speech data, which are digitized audio speech signals, and output textual information. A number of speech recognition systems are available on the market. The most powerful can recognize thousands of words. However, they generally require an extended training session during which the computer system becomes accustomed to a particular voice and accent. Such systems are said to be speaker dependent. More recently, speech recognition systems have been developed that can recognize speech without being trained using a particular voice and accent. Such systems may recognize the speech of most or any speakers, and are said to be speaker independent.

Audio speech data may be treated like any other data and stored and organized in a database. In the case of textual or numeric data, searches may be readily performed on the data by a database management system for the database. However, unlike textual or numeric data, there is no simple and efficient way to search audio speech data. In prior systems, developers who wished to search audio speech data had to develop complex software procedures in order to perform the searching. For example, to perform a typical search, a user will want to know which audio or video assets satisfy given text query search criteria and the time offsets within each matched media asset where matches occurred, and the user may also want to know the speech recognition confidence of each match. Conventionally, this required development of software to perform several iterations of extracting the relevant text, time offset, and confidence data from the speech recognition results, building appropriate B-tree indices on this extracted data, and associating time offsets and confidence values with their corresponding text data. In addition, procedures would have to be developed that would use the index and search through the text data for matched rows, and then search through the matched rows for time offsets into the media asset where matches occurred.

What is needed is a technique by which simple and efficient searches may be performed on audio speech data and which provides reduced development effort.

SUMMARY OF THE INVENTION

The present invention provides the capability to perform simple and efficient searches on audio speech data with reduced development effort. According to one embodiment of the present invention, an application program interface for indexing data relating to results of speech recognition in a database management system comprises an indextype operable to support text queries on speech recognition results, an interface operable to provide interaction with an index of the indextype, and a format adapter interface operable to invoke a format adapter for converting speech recognition results having a first format to a second format.

The format adapter may be operable to parse the speech recognition results in the first format, extract from the speech recognition results text data representing the recognized speech, information relating to a confidence in each speech recognition result, and timestamp information indicating a location of each portion of a speech recognition result, and generate speech recognition results in the second format using the extracted text data representing the recognized speech, information relating to a confidence in each speech recognition result, and timestamp information indicating a location of each portion of a speech recognition result.

The indextype may comprise the text data representing the recognized speech, the information relating to a confidence in each speech recognition result, and the timestamp information indicating a location of each portion of a speech recognition result. The interface may be operable to provide interaction comprising performing a query of the text data representing the recognized speech. The query of the text data representing the recognized speech may relate to the confidence information and/or the timestamp information. The results of the query may indicate time offsets within each matched media asset where matches occurred and the speech recognition confidence of each match occurrence within a matched media asset.

BRIEF DESCRIPTION OF THE DRAWINGS

The details of the present invention, both as to its structure and operation, can best be understood by referring to the accompanying drawings, in which like reference numbers and designations refer to like elements.

FIG. 1 is an exemplary dataflow diagram of speech indexing processing performed in the present invention.

FIG. 2 is a block diagram of an exemplary implementation of a database management system, in which the present invention may be implemented.

FIG. 3 is an exemplary flow diagram of a process of operation of the present invention.

FIG. 4 is an exemplary format of a data table that may be used in the present invention.

FIG. 5 is an exemplary code sample of how an application would invoke speech recognition on a particular row and populate the result column.

FIG. 6 is an example of an SQL command to build an index on the result column.

FIG. 7 is an example of an SQL command to create and pass preferences as arguments to index creation.

FIG. 8 is an example of a simple query on the data table, which makes use of the index.

FIG. 9 is an example of a query that retrieves confidence and timestamps for each occurrence within a matched audio asset row.

FIG. 10 is an example of an interface to a format adapter shown in FIG. 1, which is a proprietary format understanding procedure that extracts the information required for creating an index of the required indextype from a proprietary audio processing result format of a speech recognition engine.

FIG. 11 is an algorithmic description of an exemplary implementation of the proprietary format understanding procedure.

DETAILED DESCRIPTION OF THE INVENTION

An exemplary dataflow diagram of speech indexing processing performed in the present invention is shown in FIG. 1. Included in FIG. 1 are database management system (DBMS) 102, speech recognition engine 104, and speech query requestor 106. Speech query requestor 106 may be any database client, tool, or application that wants to issue text queries on audio speech data.

Database management system (DBMS) 102 provides the capability to store, organize, modify, and extract information from one or more databases included in DBMS 102. From a technical standpoint, DBMSs can differ widely. The terms relational, network, flat, and hierarchical all refer to the way a DBMS organizes information internally. The internal organization can affect how quickly and flexibly you can extract information.

Each database included in DBMS 102 includes a collection of information organized in such a way that computer software can select and retrieve desired pieces of data. Traditional databases are organized by fields, records, and files. A field is a single piece of information; a record is one complete set of fields; and a file is a collection of records. An alternative concept in database design is known as Hypertext. In a Hypertext database, any object, whether it be a piece of text, a picture, or a film, can be linked to any other object. Hypertext databases are particularly useful for organizing large amounts of disparate information, but they are not designed for numerical analysis.

Typically, a database includes not only data, but also low-level database management functions, which perform accesses to the database and store or retrieve data from the database. Such functions are often termed queries and are performed by using a database query language, such as Structured Query Language (SQL). SQL is a standardized query language for requesting information from a database. Historically, SQL has been a popular query language for database management systems running on minicomputers and mainframes. Increasingly, however, SQL is being supported by personal computer database systems because it supports distributed databases (databases that are spread out over several computer systems). This enables several users on a local-area network to access the same database simultaneously.

Most full-scale database systems are relational database systems. Small database systems, however, use other designs that provide less flexibility in posing queries. Relational databases are powerful because they require few assumptions about how data is related or how it will be extracted from the database. As a result, the same database can be viewed in many different ways. An important feature of relational systems is that a single database can be spread across several tables. This differs from flat-file databases, in which each database is self-contained in a single table.

DBMS 102 may also include one or more database applications, which are software that implements a particular set of functions that utilize one or more databases. Examples of database applications include:

-   computerized library systems
-   automated teller machines
-   flight reservation systems
-   computerized parts inventory systems

Typically, a database application includes data entry functions and data reporting functions. Data entry functions provide the capability to enter data into a database. Data entry may be performed manually, by data entry personnel, automatically, by data entry processing software that receives data from connected sources of data, or by a combination of manual and automated data entry techniques. Data reporting functions provide the capability to select and retrieve data from a database and to process and format that data for other uses. Typically, retrieved data is used to display information to a user, but retrieved data may also be used for other functions, such as account settlement, automated ordering, numerical machine control, etc.

DBMS 102 includes speech enhancements 108, format adapter 110, data table 112, and speech indexing processing 114. Speech enhancements 108 are extensions to the standard query language of DBMS 102. For example, where DBMS 102 uses SQL, speech enhancements include extensions to the command set of SQL, an indextype, and its associated operators and types to empower applications with sophisticated text querying capabilities on audio data.

Speech recognition engine 104 provides speech recognition processing functionality to DBMS 102. Speech recognition engine 104 is typically configured as a server communicatively connected to DBMS 102. Preferably, speech recognition engine 104 provides large vocabulary continuous speech recognition (LVCSR) services to DBMS 102. Essentially, speech recognition engine 104 receives data that represents digitized speech, processes the data to recognize the speech, and outputs text data that represents the speech, which is the speech recognition result. The speech recognition results are placed in the CLOB (Character Large Object) result column of data table 112, next to the audio data, by the procedure that invoked the speech recognition processing. When a Create Index command is issued on this CLOB column, speech indexing processing 114 is invoked, which in turn invokes format adapter 110. Typically, the speech recognition result is arranged in a proprietary format. Format adapter 110 adapts the format of the speech recognition result generated by speech recognition engine 104 to the format used for speech indexing. Format adapter 110 parses the speech recognition result and extracts the required information. In particular, format adapter 110 extracts text, confidence, and timestamp tuples from each speech recognition result.
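The role of format adapter 110 can be illustrated with a minimal sketch in Python. The input format used here (one recognized word per line, written as word|confidence|timestamp) is purely hypothetical and stands in for whatever proprietary format a given speech recognition engine emits; the names RecognizedWord and adapt_format are likewise assumptions made for illustration and do not appear in the figures.

```python
from typing import List, NamedTuple


class RecognizedWord(NamedTuple):
    """One <text, confidence, timestamp> tuple in the format used for indexing."""
    text: str
    confidence: float  # recognizer confidence for this word
    timestamp: float   # offset into the media asset, in seconds (assumed unit)


def adapt_format(raw_result: str) -> List[RecognizedWord]:
    """Parse a hypothetical 'word|confidence|timestamp' result into tuples."""
    words = []
    for line in raw_result.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (hypothetical) header comments
        text, confidence, timestamp = line.split("|")
        words.append(RecognizedWord(text.lower(), float(confidence), float(timestamp)))
    return words


# Hypothetical recognition result as it might be stored in the result column.
sample = """# engine=hypothetical
Oracle|0.92|1.50
database|0.81|2.10
"""
tuples = adapt_format(sample)
```

Only this parsing step would need to change to support a different engine; the tuples it returns are what the indexing step consumes.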

Speech indexing processing 114 receives the text, confidence, and timestamp tuples extracted from the proprietary format of each speech recognition result by format adapter 110, stores the extracted information in its own internal data structures, and creates an index of the required indextype based on the extracted data. When an index of the required indextype is created or updated, speech indexing processing 114 is invoked for each new or updated row in data table 112. The row data, which are extracted from the speech recognition results, along with a table name and key to the original row in the indexed table, are provided as parameters. The routine must process the speech recognition result to extract <text, timestamp, confidence> tuples and insert this data, along with some additional computed data (character offset and sequence number) and the key supplied as a parameter to the procedure, into an internal data structure that is part of the index. Speech indexing processing 114 then inserts the extracted tuples of information into index data structures that are stored independently from the table upon which the index is built.
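The indexing step described above can be sketched as follows. This is a simplified in-memory illustration in Python, not the internal data structure of the DBMS: the Posting record, the index_row function, and the rule that character offsets assume one separating space between words are all assumptions made for the example.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, NamedTuple, Tuple


class Posting(NamedTuple):
    """One index entry for a single occurrence of a word."""
    row_key: int       # key to the original row in the indexed table
    sequence: int      # computed: position of the word in the result
    char_offset: int   # computed: offset of the word in the joined text
    confidence: float
    timestamp: float


Index = DefaultDict[str, List[Posting]]


def index_row(index: Index, row_key: int,
              tuples: Iterable[Tuple[str, float, float]]) -> None:
    """Insert <text, timestamp, confidence> tuples for one row into the index."""
    char_offset = 0
    for sequence, (text, timestamp, confidence) in enumerate(tuples):
        index[text.lower()].append(
            Posting(row_key, sequence, char_offset, confidence, timestamp))
        char_offset += len(text) + 1  # assume one separating space per word


# The index lives apart from the table that holds the audio and the results.
index: Index = defaultdict(list)
index_row(index, 7, [("oracle", 1.5, 0.92),
                     ("database", 2.1, 0.81),
                     ("oracle", 9.0, 0.60)])
```

Keeping the row key in every posting is what lets a later query map a matched word back to the media asset and to the time offsets within it.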

An example of an interface 1100 to format adapter 110 and speech indexing processing 114 is shown in FIG. 10. An example of an implementation of format adapter 110 is shown in FIG. 11.

A block diagram of an exemplary implementation of a DBMS 102, in which the present invention may be implemented, is shown in FIG. 2. DBMS 102 is typically a programmed general-purpose computer system, such as a personal computer, workstation, server system, and minicomputer or mainframe computer. DBMS 102 includes one or more processors (CPUs) 202A-202N, input/output circuitry 204, network adapter 206, and memory 208. CPUs 202A-202N execute program instructions in order to carry out the functions of the present invention. Typically, CPUs 202A-202N are one or more microprocessors, such as an INTEL PENTIUM® processor. FIG. 2 illustrates an embodiment in which DBMS 102 is implemented as a single multi-processor computer system, in which multiple processors 202A-202N share system resources, such as memory 208, input/output circuitry 204, and network adapter 206. However, the present invention also contemplates embodiments in which DBMS 102 is implemented as a plurality of networked computer systems, which may be single-processor computer systems, multi-processor computer systems, or a mix thereof.

Input/output circuitry 204 provides the capability to input data to, or output data from, DBMS 102. For example, input/output circuitry may include input devices, such as keyboards, mice, touchpads, trackballs, scanners, etc., output devices, such as video adapters, monitors, printers, etc., and input/output devices, such as modems, etc. Network adapter 206 interfaces DBMS 102 with network 210. Network 210 may include one or more standard local area networks (LAN) or wide area networks (WAN), such as Ethernet, Token Ring, the Internet, or a private or proprietary LAN/WAN.

Memory 208 stores program instructions that are executed by, and data that are used and processed by, CPUs 202A-202N to perform the functions of DBMS 102. Memory 208 may include electronic memory devices, such as random-access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), flash memory, etc., and electromechanical memory, such as magnetic disk drives, tape drives, optical disk drives, etc., which may use an integrated drive electronics (IDE) interface, or a variation or enhancement thereof, such as enhanced IDE (EIDE) or ultra direct memory access (UDMA), or a small computer system interface (SCSI) based interface, or a variation or enhancement thereof, such as fast-SCSI, wide-SCSI, fast and wide-SCSI, etc., or a fiber channel-arbitrated loop (FC-AL) interface.

In the example shown in FIG. 2, memory 208 includes database management routines 212, database 214, and operating system 216. Database management routines 212 include software routines that provide the database management functionality of DBMS 102. Database management routines 212 include SQL interface with speech enhancements 108, format adapter 110, and speech indexing processing 114. SQL interface 108 accepts database queries using the SQL database query language, converts the queries to a series of database access commands, calls database processing routines to perform the series of database access commands, and returns the results of the query to the source of the query. For example, in an embodiment in which DBMS 102 is a proprietary DBMS, such as the ORACLE® DBMS, SQL interface 108 may support one or more particular versions of SQL or extensions to SQL, such as the ORACLE® PL/SQL extension to SQL. Speech enhancements are extensions to the standard query language of DBMS 102. For example, where DBMS 102 uses SQL, speech enhancements include extensions to the command set of SQL, an indextype, and its associated operators and types to empower applications with sophisticated text querying capabilities on audio data.

Format adapter 110 processes the speech recognition result from speech recognition engine 104. Typically, the speech recognition result is arranged in a proprietary format. Format adapter 110 adapts the format of the speech recognition result generated by speech recognition engine 104 to the format used for speech indexing. Format adapter 110 parses the speech recognition result and extracts the required information. In particular, format adapter 110 extracts text, confidence, and timestamp tuples from each speech recognition result.

Speech indexing processing 114 receives the text, confidence, and timestamp tuples extracted from the proprietary format of each speech recognition result by format adapter 110, stores the extracted information in its own internal data structures, and creates an index of the required indextype based on the extracted data. When an index of the required indextype is created or updated, speech indexing processing 114 is invoked for each new or updated row in data table 112. The row data, which are extracted from the speech recognition results, along with a table name and key to the original row in the indexed table, are provided as parameters. The routine must process the speech recognition result to extract <text, timestamp, confidence> tuples and insert this data, along with some additional computed data (character offset and sequence number) and the key supplied as a parameter to the procedure, into an internal data structure that is part of the index. Speech indexing processing 114 then inserts the extracted tuples of information into index data structures that are stored independently from the table upon which the index is built.

Database 214 includes a collection of information organized in such a way that computer software can select, store, and retrieve desired pieces of data. Typically, database 214 includes a plurality of data tables, such as data table 112. Data table 112 is arranged to store audio speech data that has been or is to be processed by speech recognition engine 104, shown in FIG. 1, and speech recognition processing results output by speech recognition engine 104. Preferably, indexing information is kept in internal data structures, not in the same data table that stores the media data and speech recognition results. Typically, a user of the system would store media assets in data table 112.

In addition, as shown in FIG. 2, the present invention contemplates implementation on a system or systems that provide multi-processor, multi-tasking, multi-process, and/or multi-thread computing, as well as implementation on systems that provide only single processor, single thread computing. Multi-processor computing involves performing computing using more than one processor. Multi-tasking computing involves performing computing using more than one operating system task. A task is an operating system concept that refers to the combination of a program being executed and bookkeeping information used by the operating system. Whenever a program is executed, the operating system creates a new task for it. The task is like an envelope for the program in that it identifies the program with a task number and attaches other bookkeeping information to it. Many operating systems, including UNIX®, OS/2®, and WINDOWS®, are capable of running many tasks at the same time and are called multitasking operating systems. Multi-tasking is the ability of an operating system to execute more than one executable at the same time. Each executable is running in its own address space, meaning that the executables have no way to share any of their memory. This has advantages, because it is impossible for any program to damage the execution of any of the other programs running on the system. However, the programs have no way to exchange any information except through the operating system (or by reading files stored on the file system). Multi-process computing is similar to multi-tasking computing, as the terms task and process are often used interchangeably, although some operating systems make a distinction between the two.

An exemplary flow diagram of a typical process 300 of operation of a database management system incorporating the present invention is shown in FIG. 3. It is best viewed in conjunction with FIG. 1. Process 300 begins with step 302, in which media content is uploaded into a database table in DBMS 102, such as data table 112. In particular, media content includes audio speech data, which are digitized audio speech signals. In step 304, a speech recognition processing requestor 106, such as an application, that wants to process audio data with speech recognition engine 104 invokes the appropriate speech recognition. This causes speech recognition engine 104, which is waiting for speech processing requests, to receive a request for speech recognition. The received requests are processed by interface 116 of speech recognition engine 104. Speech recognition engine 104 processes the speech data in this request in order to recognize the speech and generate text data representing the recognized speech.

In step 306, format adapter 110 adapts the format of the speech recognition result generated by speech recognition engine 104 to the format used for speech indexing. Format adapter 110 parses the speech recognition result and extracts the required information. In particular, format adapter 110 extracts text, confidence, and timestamp tuples from each speech recognition result. Then, speech indexing processing 114 receives the text, confidence, and timestamp tuples extracted from the proprietary format of each speech recognition result by format adapter 110, inserts the extracted data into data table 112, and creates an index of the required indextype based on the inserted data. In one embodiment shown in FIG. 1, the extracted text, confidence, and timestamp tuples from format adapter 110 are passed directly to speech indexing processing 114 for index creation. In other embodiments, the extracted text, confidence, and timestamp tuples from format adapter 110 may be stored before being passed to speech indexing processing 114 for index creation. When an index of the required indextype is created or updated, speech indexing processing 114 is invoked for each new or updated row in data table 112. The row data, which are extracted from the speech recognition results, along with a table name and key to the original row in the indexed table, are provided as parameters. This routine must process the data to extract <text, timestamp, confidence> tuples and insert them into data table 112. Speech indexing processing 114 then inserts the extracted tuples of information into the associated index data structures.

In step 308, speech query requestor 106 generates a query on the text data included in data table 112 and transmits the query to DBMS 102. The generated query utilizes speech enhancements 108 to the query language used by DBMS 102. In step 310, DBMS 102 performs the query by accessing data table 112, using the index to retrieve the specified information, and returning the results of the query to speech query requestor 106.

Following is an exemplary description of a sample usage scenario that will demonstrate the power and ease of use of the speech indexing functionality provided by the present invention.

Imagine a scenario in which a customer wants to do the following:

1.  Upload media content including audio into DBMS 102.
2.  Process the audio by sending data to a previously started speech recognition engine 104 and store the results in DBMS 102.
3.  Create an index on the speech recognition results that will allow for sophisticated text querying capabilities.
4.  Query the data to retrieve matched rows along with time offset and speech recognition confidence pairs for each occurrence within a matched row.

The customer stores media content including audio in data table 112 in DBMS 102. An exemplary format of data table 112 is shown in FIG. 4. Data table 112 includes id column 402, audio data column 404, and result column 406. For each row of data in data table 112, id column 402 includes a unique identifier of the data in the row, audio data column 404 includes the actual audio data, and result column 406 includes the speech recognition results. An example 500 of how an application would invoke speech recognition on a particular row and populate the result column 406 is shown in FIG. 5.
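The three-column layout of data table 112 and the step of populating the result column can be sketched as below. This is an illustration only: it uses Python's built-in sqlite3 module as a stand-in for DBMS 102, TEXT in place of a CLOB, and hypothetical table, column, and function names (audio_assets, audio_data, result, recognize). The recognize function is a placeholder for the call to speech recognition engine 104 and returns a made-up result string.

```python
import sqlite3

# In-memory SQLite database standing in for DBMS 102 (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audio_assets ("
    " id INTEGER PRIMARY KEY,"   # corresponds to id column 402
    " audio_data BLOB,"          # corresponds to audio data column 404
    " result TEXT)"              # corresponds to result column 406
)

# Step 1: upload media content; the result column is empty at this point.
conn.execute(
    "INSERT INTO audio_assets (id, audio_data) VALUES (?, ?)",
    (1, b"\x00\x01"),
)


def recognize(audio: bytes) -> str:
    """Placeholder for the speech recognition engine; output is made up."""
    return "oracle|0.92|1.50"


# Step 2: run recognition on one row and store the result next to the audio.
audio = conn.execute(
    "SELECT audio_data FROM audio_assets WHERE id = ?", (1,)
).fetchone()[0]
conn.execute(
    "UPDATE audio_assets SET result = ? WHERE id = ?",
    (recognize(audio), 1),
)
stored = conn.execute(
    "SELECT result FROM audio_assets WHERE id = 1"
).fetchone()[0]
```

Once every row has a populated result column, an index can be built on that column.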

After the application has processed each audio asset in data table 112 and populated the result column 406, the application is ready to build an index on the result column, for example, using an SQL command 600, such as that shown in FIG. 6. For enhanced text queries that need to take into account customized preferences, such as lexer and wordlist preferences, the application can create preferences and pass those preferences as arguments to index creation, for example, using an SQL command 700, such as that shown in FIG. 7.

An example 800 of a simple query on the data table 112 is shown in FIG. 8. An example 900 of a more sophisticated query that retrieves confidence and timestamps for each occurrence within a matched audio asset row is shown in FIG. 9. In this example, the SpeechContains operator matches those rows that satisfy the input query, while the ancillary operator SpeechConfidenceTimestamp returns the corresponding collection of confidence/timestamp pairs for each returned row.
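The relationship between the two operators can be mimicked in a few lines of Python. The sketch below is not the SQL operators themselves: speech_contains is a hypothetical function over a hand-built in-memory index with made-up postings, and it handles only a single-word query, whereas the real operators accept richer query strings.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical index: word -> postings of (row_key, confidence, timestamp).
Posting = Tuple[int, float, float]
INDEX: Dict[str, List[Posting]] = {
    "oracle": [(1, 0.92, 1.5), (1, 0.60, 9.0), (3, 0.75, 4.2)],
    "database": [(1, 0.81, 2.1)],
}


def speech_contains(index: Dict[str, List[Posting]],
                    word: str) -> Dict[int, List[Tuple[float, float]]]:
    """Return matched row keys, each with its confidence/timestamp pairs."""
    matches: Dict[int, List[Tuple[float, float]]] = defaultdict(list)
    for row_key, confidence, timestamp in index.get(word.lower(), []):
        matches[row_key].append((confidence, timestamp))
    return dict(matches)


hits = speech_contains(INDEX, "Oracle")
for row_key, pairs in hits.items():
    print(row_key, pairs)
```

The keys of the returned mapping play the part of the rows matched by SpeechContains, and each value plays the part of the collection returned by SpeechConfidenceTimestamp for that row.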

Format adapter 110 must be provided to adapt the format of the speech recognition result generated by speech recognition engine 104 to the format used for speech indexing. The formatting procedure must extract the information required for creating an index of the required indextype from the proprietary audio processing result format of speech recognition engine 104. In one embodiment, when an index of the required indextype is created or updated, format adapter 110 is invoked for each new or updated row in the indexed table. The row data, which are processing results of SpeechMining, along with a table name and key to the original row in the indexed table, are provided as parameters. This routine must process the data to extract <text, timestamp, confidence> tuples. An example of the interface 100 to format adapter 110 is shown in FIG. 10. An example of the processing 1100 performed by format adapter 110 is shown in FIG. 11.

APPENDIX A

OPERATOR: SpeechContains

Signature

    SpeechContains(indexed_column CLOB, query_string VARCHAR2, [reference_label NUMBER]) RETURN NUMBER;

Description

Use the SpeechContains operator in the WHERE clause of a SELECT statement to specify the query expression for a SpeechIndexing query. SpeechContains returns a relevance score for every row selected. You obtain this score with the SpeechScore operator. Additionally, SpeechConfidenceTimestamp returns tuples of speech recognition confidences and time offsets for the matches in the selected row.

Parameters

indexed_column: Specify the CLOB column to be searched on. This column must have an ordsys.ORDSpeechIndex index associated with it.

query_string: Specify the query that defines your search in indexed_column. Oracle Text query operators can be used in this query string.

reference_label: Optionally specify the label that associates the SpeechScore and SpeechConfidenceTimestamp generated by the SpeechContains operator.

Returns

For each row selected, SpeechContains returns a number between 0 and 100 that indicates how relevant the document row is to the query. The number 0 means that Oracle found no matches in the row.

Example

The following example searches for all documents in the SpeechMining_result column that contain the word 'oracle'. The score for each row is selected with the SpeechScore operator using a label of 1:

    SELECT ordsys.SpeechScore(1), title
    FROM audionews
    WHERE ordsys.SpeechContains(SpeechMining_result, 'oracle', 1) > 0;

OPERATOR: SpeechScore

Signature

    SpeechScore(reference_label IN NUMBER) RETURN NUMBER;

Description

Use the SpeechScore operator in a SELECT statement to return the score values produced by SpeechContains in a SpeechIndexing query.

Parameters

reference_label: An integer that refers to the corresponding invocation of SpeechContains. If there are multiple invocations of SpeechContains in the same query, this parameter is used to maintain the reference.

Notes

The SpeechScore operator can be used in a SELECT, ORDER BY, or GROUP BY clause.

Returns

This operator returns a NUMBER.

Example

See the example for SpeechContains.

OPERATOR: SpeechConfidenceTimestamp

Signature

    SpeechConfidenceTimestamp(reference_label IN NUMBER) RETURN ordsys.ORDConfidenceTimestampTable;

Description

Use the SpeechConfidenceTimestamp operator in a SELECT statement to return a collection of confidence and timestamp pairs produced by SpeechContains in a SpeechIndexing query.

Parameters

reference_label: An integer that refers to the corresponding invocation of SpeechContains. If there are multiple invocations of SpeechContains in the same query, this parameter is used to maintain the reference.

Notes

The SpeechConfidenceTimestamp operator can be used in a SELECT clause.

Returns

This operator returns a table of type ordsys.ORDConfidenceTimestampTable (defined below).

INDEXTYPE: ORDSpeechIndex

Description

This indextype allows a user to create an audio index on a CLOB column that contains the results of SpeechMining.

Parameters

parameter_string: Can be used to pass in Oracle Text preferences to the underlying Oracle Text index. Note that datastore preferences are disallowed.

The types below are used to retrieve speech recognition confidence and timestamp values from the query into PL/SQL variables.

OBJECT ORDConfidenceTimestampTuple

    CREATE TYPE ORDConfidenceTimestampTuple AS OBJECT (confidence NUMBER, timestamp NUMBER);

OBJECT ORDConfidenceTimestampTable

    CREATE TYPE ORDConfidenceTimestampTable AS TABLE OF ORDConfidenceTimestampTuple;

1. A method for indexing data relating to results of speech recognitionin a database management system, comprising the steps of: receivingspeech recognition results at the database management system, the speechrecognition results having a first format; converting the first formatof the speech recognition results to a second format; and generating anindex of the speech recognition results in the database managementsystem.
2. The method of claim 1, wherein the converting step comprises the steps of: parsing the speech recognition results in the first format; extracting from the speech recognition results text data representing the recognized speech, information relating to a confidence in each speech recognition result, and timestamp information indicating a location of each portion of a speech recognition result; and generating speech recognition results in the second format using the extracted text data representing the recognized speech, information relating to a confidence in each speech recognition result, and timestamp information indicating a location of each portion of a speech recognition result.
3. The method of claim 2, wherein the second format is a standardized format.
4. The method of claim 3, wherein the first format is a proprietary format.
5. The method of claim 2, wherein the generating step comprises the steps of: generating an index using the extracted speech recognition results, including the text data representing the recognized speech, the information relating to a confidence in each speech recognition result, and the timestamp information indicating a location of each portion of a speech recognition result in the database management system; and storing the extracted information.
6. The method of claim 5, wherein the second format is a standardized format.
7. The method of claim 6, wherein the first format is a proprietary format.
8. A system for indexing data relating to results of speech recognition in a database management system comprising: a processor operable to execute computer program instructions; a memory operable to store computer program instructions executable by the processor; and computer program instructions stored in the memory and executable to perform the steps of: receiving speech recognition results at the database management system, the speech recognition results having a first format; converting the first format of the speech recognition results to a second format; and generating an index of the speech recognition results in the database management system.
9. The system of claim 8, wherein the converting step comprises the steps of: parsing the speech recognition results in the first format; extracting from the speech recognition results text data representing the recognized speech, information relating to a confidence in each speech recognition result, and timestamp information indicating a location of each portion of a speech recognition result; and generating speech recognition results in the second format using the extracted text data representing the recognized speech, information relating to a confidence in each speech recognition result, and timestamp information indicating a location of each portion of a speech recognition result.
10. The system of claim 9, wherein the second format is a standardized format.
11. The system of claim 10, wherein the first format is a proprietary format.
12. The system of claim 9, wherein the generating step comprises the steps of: generating an index using the extracted speech recognition results, including the text data representing the recognized speech, the information relating to a confidence in each speech recognition result, and the timestamp information indicating a location of each portion of a speech recognition result in the database management system; and storing the extracted information.
13. The system of claim 12, wherein the second format is a standardized format.
14. The system of claim 13, wherein the first format is a proprietary format.
15. A computer program product for indexing data relating to results of speech recognition in a database management system comprising: a computer readable medium; and computer program instructions, recorded on the computer readable medium, executable by a processor, for performing the steps of: receiving speech recognition results at the database management system, the speech recognition results having a first format; converting the first format of the speech recognition results to a second format; and generating an index of the speech recognition results in the database management system.
16. The computer program product of claim 15, wherein the converting step comprises the steps of: parsing the speech recognition results in the first format; extracting from the speech recognition results text data representing the recognized speech, information relating to a confidence in each speech recognition result, and timestamp information indicating a location of each portion of a speech recognition result; and generating speech recognition results in the second format using the extracted text data representing the recognized speech, information relating to a confidence in each speech recognition result, and timestamp information indicating a location of each portion of a speech recognition result.
17. The computer program product of claim 16, wherein the second format is a standardized format.
18. The computer program product of claim 17, wherein the first format is a proprietary format.
19. The computer program product of claim 16, wherein the generating step comprises the steps of: generating an index using the extracted speech recognition results, including the text data representing the recognized speech, the information relating to a confidence in each speech recognition result, and the timestamp information indicating a location of each portion of a speech recognition result in the database management system; and storing the extracted information.
20. The computer program product of claim 19, wherein the second format is a standardized format.
21. The computer program product of claim 20, wherein the first format is a proprietary format.
22. An application program interface for indexing data relating to results of speech recognition in a database management system comprising: an indextype operable to support text queries on speech recognition results; an interface operable to provide interaction with an index of the indextype; and a format adapter interface operable to invoke a format adapter for converting speech recognition results having a first format to a second format.
23. The application program interface of claim 22, wherein the format adapter is operable to parse the speech recognition results in the first format, extract from the speech recognition results text data representing the recognized speech, information relating to a confidence in each speech recognition result, and timestamp information indicating a location of each portion of a speech recognition result, and generate speech recognition results in the second format using the extracted text data representing the recognized speech, information relating to a confidence in each speech recognition result, and timestamp information indicating a location of each portion of a speech recognition result.
24. The application program interface of claim 23, wherein the indextype comprises the text data representing the recognized speech, the information relating to a confidence in each speech recognition result, and the timestamp information indicating a location of each portion of a speech recognition result in the database management system.
25. The application program interface of claim 24, wherein the interface is operable to provide interaction comprising performing a query of the text data representing the recognized speech.
26. The application program interface of claim 25, wherein the query of the text data representing the recognized speech relates to the confidence information and/or the timestamp information.
27. The application program interface of claim 26, wherein results of the query indicate time offsets within each matched media asset where matches occurred and speech recognition confidence of each match occurrence within a matched media asset.
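
By way of illustration only, and without limiting the claims, the receiving, converting, and index-generating steps recited above may be sketched in Python as follows. Everything concrete in this sketch is an assumption: the "word|confidence|offset" line layout stands in for an unspecified proprietary first format, the Token tuple stands in for the second format, and the names adapt, build_index, and search are hypothetical. An in-memory dictionary replaces the database index purely to keep the sketch self-contained.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Token(NamedTuple):
    """Second (common) format: one recognized word with its metadata."""
    text: str
    confidence: float
    timestamp: float


def adapt(raw: str) -> List[Token]:
    """Hypothetical format adapter: parse 'word|confidence|offset' lines
    (an assumed first format) and extract text, confidence, and timestamp."""
    tokens = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the recognizer output
        word, conf, offset = line.split("|")
        tokens.append(Token(word.lower(), float(conf), float(offset)))
    return tokens


Posting = Tuple[int, float, float]  # (document id, confidence, timestamp)


def build_index(documents: Dict[int, List[Token]]) -> Dict[str, List[Posting]]:
    """Generate an inverted index that keeps confidence and timestamp per match."""
    index: Dict[str, List[Posting]] = defaultdict(list)
    for doc_id, tokens in documents.items():
        for token in tokens:
            index[token.text].append((doc_id, token.confidence, token.timestamp))
    return dict(index)


def search(index: Dict[str, List[Posting]], word: str) -> List[Posting]:
    """Return every occurrence of a word with its confidence and time offset."""
    return index.get(word.lower(), [])


# Hypothetical recognizer output for two media assets, keyed by document id.
raw_results = {
    1: "Oracle|0.92|3.5\nannounced|0.80|4.1\nearnings|0.65|4.9",
    2: "the|0.99|0.0\noracle|0.55|1.2",
}

index = build_index({doc_id: adapt(raw) for doc_id, raw in raw_results.items()})
hits = search(index, "oracle")
print(hits)  # prints [(1, 0.92, 3.5), (2, 0.55, 1.2)]
```

Each hit carries the matched asset, the recognition confidence, and the time offset, which corresponds to the kind of query result described in claim 27. Keeping the adapter separate from index generation means a different recognizer format would require only a different adapt function.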